Abstract

Pro-social punishment, whereby cooperators punish defectors, is often suggested as a 
mechanism that maintains cooperation in large human groups. Importantly, models 
that support this idea have to date only allowed defectors to be the target of punish- 
ment. However, recent empirical work has demonstrated the existence of anti-social 
punishment in public goods games. That is, individuals that defect have been found 
to also punish cooperators. Some recent theoretical studies have found that such anti- 
social punishment can prevent the evolution of pro-social punishment and cooperation. 
However, the evolution of anti-social punishment in group-structured populations has 
not been formally addressed. Previous work has informally argued that group-structure 
must favour pro-social punishment. Here we formally investigate how two demographic 
factors, group size and dispersal frequency, affect selection pressures on pro- and anti- 
social punishment. Contrary to the suggestions of previous work, we find that anti- 
social punishment can prevent the evolution of pro-social punishment and cooperation 
under a range of group structures. Given that anti-social punishment has now been 
found in all studied extant human cultures, the claims of previous models showing the 
co-evolution of pro-social punishment and cooperation in group-structured populations 
should be re-evaluated. [This is a post-print of an accepted manuscript published in
Journal of Theoretical Biology 311 (2012) 107-116. The publisher's version is available
from http://www.sciencedirect.com/science/article/pii/S002251931200344X]

Keywords: anti-social punishment, equilibrium selection, public goods, behavioural 
economics, strong reciprocity 



1. Introduction 



Understanding the evolution of individually-costly cooperative behaviours is a major
focus of social evolution theory (Hamilton, 1964; Wilson, 1975; Frank, 1998;
Lehmann and Keller, 2006; West et al., 2007; Archetti and Scheuring, 2012). It is now
widely appreciated that cooperative behaviours can evolve if they provide either a direct
fitness benefit to the actor during its lifetime, or an indirect fitness benefit by helping
other cooperators (e.g. relatives; Hamilton, 1964; Lehmann and Keller, 2006;
West et al., 2007). A great deal of current research aims to understand the biological
mechanisms that provide direct or indirect fitness benefits in different scenarios
(Hammerstein, 2003; West et al., 2007). In particular, identifying the mechanisms that
provide direct or indirect fitness benefits to cooperators in large human groups, where
genetic relatedness is typically low, remains an open challenge. This question has
received much attention because humans appear to cooperate on a much larger scale
than other species, and thus large-scale cooperation seems to be one of the properties
that make human sociality unique.

* Corresponding author

Email addresses: Simon.Powers@unil.ch (Simon T. Powers), D.J.Taylor@bath.ac.uk (Daniel J.
Taylor), J.J.Bryson@bath.ac.uk (Joanna J. Bryson)

1 Present address: Department of Ecology & Evolution, Biophore, University of Lausanne, CH-1015
Lausanne, Switzerland

Preprint submitted to Journal of Theoretical Biology March 8, 2013

Cooperation in humans is often framed in terms of the production and sharing of 
various public goods. This may take the form of, for example, sharing food or information 
with other group members, or contributing time, energy, and resources to a group project. 
Throughout this paper, we focus on social dilemmas that take the form of linear public 
goods games.² In linear public goods games (PGG) there is an apparent individual
advantage to defection, that is, to reaping the benefits of the public good without paying
the individual costs of contributing to it. We would expect such defectors to be fitter
than cooperators within the same group, and hence for natural selection to lead to the
breakdown of cooperation in a "Tragedy of the Commons" (Hardin, 1968). However,
we see that cooperation is nevertheless maintained in large human groups despite the 
apparent advantage of defection. Explaining this is problematic because many cases of 
cooperation in humans occur in large groups of unrelated individuals. 

Punishment behaviours have been widely suggested as a solution to this quandary. In 
particular, cooperative individuals may have the option of punishing defectors. Typically, 
punishment takes the form of an actor paying a cost by reducing its own fitness in order 
to reduce the fitness of the punishment target, as in the following fitness functions: 



$$w_d = 1 + B x_p - P x_p \qquad (1)$$

$$w_p = 1 + B x_p - C - K x_d \qquad (2)$$



where $w_d$ is the fitness of defectors, and $w_p$ the fitness of pro-social punishers (individuals
who cooperate and then punish any group member that defected). In these functions B 
is a constant representing the benefit that a single cooperator provides, C the cost of 
cooperation, P a constant representing the cost of being punished, and K the cost of 
punishing an individual. The constant 1 represents a baseline fitness in the absence of 
social interactions. 

The proportions of defectors and pro-social punishers within a group are denoted by $x_d$
and $x_p$, respectively. The relative cost of P and K is subject to some debate; however,
empirical evidence from PGG experiments shows similar effects for a range of relative
values (Anderson and Putterman, 2006). In the present paper we assume that the cost
of being punished is partially distributed, and depends upon the proportion of punishers,
not the absolute number, in a group. This is a common assumption in models of the
evolution of punishment (Boyd and Richerson, 1992; Boyd et al., 2003; Lehmann et al.,
2007). We assume that an individual has a fixed amount of time and energy that can
be spent on acts of punishment, and also (as in prior work; Boyd and Richerson, 1992;
Boyd et al., 2003) that each punisher punishes every available target in its group, or at
least that its cost of punishment is proportional to its probability of encountering
punishment targets. The more individuals there are for an actor to punish, the less
effort the actor can exert on punishing any one individual, and hence the less absolute
damage is inflicted per punisher per target. Note that in most literature, punishment
targets are taken to be defectors, but in the case of anti-social punishment targets may
be cooperators.

2 See Archetti and Scheuring (2012) for a discussion of the differences between linear and non-linear
public goods games and how these affect the selective dynamics of cooperation.

In the above typical model, when pro-social punishers are sufficiently common within
a group, it is individually advantageous for potential defectors to switch behaviour to
cooperating. The condition for this is $P x_p > K x_d + C$ (which follows directly from
requiring $w_p > w_d$ in Equations (1) and (2)), that is, when punishers are in sufficient
frequency that the cost of punishing and cooperating (i.e. being a pro-social punisher)
is less than the cost of being punished (Lehmann et al., 2007). Thus, in this simple
model the evolutionarily stable states are either pro-social punishment at fixation,
or defection at fixation. Note that this result holds even in a well-mixed population, and
so would be applicable to large groups of unrelated individuals (Boyd and Richerson,
1992).
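As a worked example of the switching condition, the sketch below evaluates Equations (1) and (2) under a simplifying assumption of ours: the group contains only defectors and pro-social punishers, so that $x_d = 1 - x_p$. Setting $P x_p = K(1 - x_p) + C$ then gives the threshold $x_p^* = (C + K)/(P + K)$. The function names are hypothetical, and the parameter values are the ones used later in the simulations.

```python
# Sketch of the two-strategy model, Equations (1)-(2).
# Illustrative assumption: only defectors and pro-social punishers are present,
# so the defector proportion is x_d = 1 - x_p.


def w_defector(x_p: float, B: float, P: float) -> float:
    """Fitness of a defector, Equation (1)."""
    return 1 + B * x_p - P * x_p


def w_punisher(x_p: float, x_d: float, B: float, C: float, K: float) -> float:
    """Fitness of a pro-social punisher, Equation (2)."""
    return 1 + B * x_p - C - K * x_d


def switching_threshold(C: float, K: float, P: float) -> float:
    """Solve P * x_p = K * (1 - x_p) + C for x_p."""
    return (C + K) / (P + K)


if __name__ == "__main__":
    B, C, K, P = 0.9, 0.1, 0.1, 0.5
    print(f"x_p* = {switching_threshold(C, K, P):.4f}")  # x_p* = 0.3333
    for x_p in (0.2, 0.5):
        x_d = 1 - x_p
        fitter = "punishers" if w_punisher(x_p, x_d, B, C, K) > w_defector(x_p, B, P) else "defectors"
        print(f"x_p = {x_p}: {fitter} are fitter")
```

Below the threshold defectors are fitter and above it punishers are, which is the bistability (two alternative fixation states) described in the text.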



There are, however, at least three problems with this as a mechanism for the
maintenance of cooperation in human groups. The first is that because both pro-social
punishment and defection are equilibria, why would we expect natural selection to lead to one
rather than the other (Boyd et al., 2003)? It has been argued that group selection (in
the broad sense; Okasha, 2006) should promote the pro-social punishment equilibrium
(Boyd et al., 2003; Bowles and Gintis, 2004), because this increases the mean fitness of
group members. The second problem is that pro-social punishment may not actually
be stable under mutation if cooperation and punishment are not perfectly linked traits
(Lehmann et al., 2007). This is because pro-social punishers may be slowly replaced
by non-punishing cooperators that do not pay the cost of punishing, and who are not
themselves punished (the second-order free-riding problem; Colman, 2006). As pro-social
punishers decline in frequency due to the accumulation of non-punishing mutants,
defection may again become advantageous. The third problem is that there is no reason to
suppose that only defectors can be targets of punishment: defectors may have the option
of punishing cooperators, as evidenced by recent empirical work (Herrmann et al., 2008).
Exploring the consequences of this third problem, termed anti-social punishment (ASP),
motivates the present investigation.

Recent theoretical work (Rand et al., 2010; Rand and Nowak, 2011) has suggested
that anti-social punishment can thwart the evolution of pro-social punishment and
cooperation. However, no study has formally addressed anti-social punishment in
group-structured populations, even though such population structures are frequently
modelled when considering the evolution of pro-social punishment (Boyd et al., 2003, 2010;
Bowles and Gintis, 2004; Lehmann et al., 2007). Group-structured populations are an
essential component of cultural group selection, and of recent arguments about punishment
promoting group-beneficial cultural norms in humans (Boyd et al., 2003; Gintis et al.,
2003; Mathew and Boyd, 2011; Sober and Wilson, 1998). Thus such populations warrant
an explicit model in order to determine whether the inclusion of anti-social punishment
affects the claims of these works. For example, some previous work has verbally argued
that the presence of group structure should be expected to favour pro-social punishment
even when anti-social punishment is present (Rand et al., 2010; Rand and Nowak, 2011).
According to this view, we might not expect anti-social punishment to have any effect
in group-structured populations. Clearly, the implications for understanding the
proximate mechanisms promoting cooperation in humans motivate a formal examination
of this case.

In this paper, we formally address the evolution of anti-social punishment in group- 
structured populations, focusing on the conditions under which it prevents the evolution 
of pro-social punishment. We also consider the effects of non-punishing cooperator and 
defector mutations. 



2. A model of the evolution of pro- and anti-social punishment in group- 
structured populations 

We consider here a population of individuals that live and reproduce in social groups 
for a number of generations. In each generation, the following two-phase social
interaction occurs and determines fitness within the group. Firstly, each individual either
cooperates by paying an individual cost to contribute to a public good, or defects by 
not contributing. All group members then receive an equal share of the benefit of this 
good, regardless of whether they cooperated in its production or not. Then, in the sec- 
ond stage individuals have the option of punishing other group members, based on how 
they behaved in the first stage. We model here the evolution of four behavioural strate- 
gies: cooperate but do not punish (non-punishing cooperate); defect but do not punish 
(non-punishing defect); cooperate and punish all individuals that defected (pro-social 
punisher); defect and punish all individuals that cooperated (anti-social punisher). The 
fitness of these four types within a single group is given, respectively, by the following 
fitness functions: 

$$w_c = 1 + B(x_c + x_p) - C - P x_a \qquad (3)$$
$$w_d = 1 + B(x_c + x_p) - P x_p \qquad (4)$$
$$w_p = 1 + B(x_c + x_p) - C - K(x_d + x_a) - P x_a \qquad (5)$$
$$w_a = 1 + B(x_c + x_p) - K(x_c + x_p) - P x_p \qquad (6)$$

Here, $x_c$, $x_d$, $x_p$ and $x_a$ are the proportions of non-punishing cooperators,
non-punishing defectors, pro-social punishers and anti-social punishers, respectively. As in
the previous section, B is a constant representing the benefit that a single cooperator
provides, and C is a constant representing the cost of cooperation. K and P are
constants representing the cost of punishing and being punished, respectively. The constant
1 represents a baseline fitness in the absence of social interactions. These fitness functions
are based on those commonly used to model the evolution of pro-social punishment (e.g.
Boyd et al., 2003; Lehmann et al., 2007), with the addition here of anti-social
punishment. Note that in this model, pro-social punishers punish both types of defectors, i.e.
non-punishing defectors and anti-social punishers. Likewise, anti-social punishers punish
both types of cooperators, i.e. non-punishing cooperators and pro-social punishers.
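The four fitness functions translate directly into code. The following Python sketch evaluates Equations (3)-(6) for a given within-group composition; the names `Params` and `fitnesses` are ours for illustration and are not taken from the original implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Model constants; defaults are the values used in the simulations."""
    B: float = 0.9  # benefit provided by a single cooperator
    C: float = 0.1  # cost of cooperating
    K: float = 0.1  # cost of punishing
    P: float = 0.5  # cost of being punished


def fitnesses(x_c, x_d, x_p, x_a, p=Params()):
    """Return (w_c, w_d, w_p, w_a) as given by Equations (3)-(6).

    The arguments are within-group proportions of non-punishing cooperators,
    non-punishing defectors, pro-social punishers and anti-social punishers.
    """
    share = p.B * (x_c + x_p)  # public good received equally by every member
    w_c = 1 + share - p.C - p.P * x_a
    w_d = 1 + share - p.P * x_p
    w_p = 1 + share - p.C - p.K * (x_d + x_a) - p.P * x_a  # punishes d and a
    w_a = 1 + share - p.K * (x_c + x_p) - p.P * x_p        # punishes c and p
    return w_c, w_d, w_p, w_a
```

For example, `fitnesses(0.0, 0.5, 0.5, 0.0)`, a group split evenly between non-punishing defectors and pro-social punishers, gives $w_d \approx 1.2$ and $w_p \approx 1.3$.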

The number of individuals of type $i$, $n_i$, within a group changes deterministically each
generation according to the following difference equation:

$$n_i(t + 1) = n_i(t) + n_i(t)\, w_i \qquad (7)$$

where $t$ refers to the current generation, and $w_i$ is the fitness of type $i$ within the group
at the current generation, as given by Equations (3)-(6). Fractional parts are maintained
throughout. We assume that reproduction is asexual, and that genotypes are haploid
with a single locus determining behaviour. Note that in Equation (7) all individuals of the
same type (strategy) within any one group have the same fitness, and reproduce by the
same amount. Thus, within groups we do not need to explicitly track each individual,
but can simply track type densities.
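Because all members of a type share one fitness value, a group can be stored as four (possibly fractional) type counts. A minimal sketch of iterating Equation (7) for T generations, assuming a dictionary keyed by the hypothetical labels `"c"`, `"d"`, `"p"`, `"a"`:

```python
# Sketch of within-group dynamics, Equation (7): n_i(t+1) = n_i(t) + n_i(t) * w_i.
# Fractional counts are kept, as in the text. Parameter values are those used in the simulations.
B, C, K, P = 0.9, 0.1, 0.1, 0.5


def group_fitness(n):
    """Equations (3)-(6) for a group given as type counts with keys 'c', 'd', 'p', 'a'."""
    total = sum(n.values())
    x = {t: count / total for t, count in n.items()}
    share = B * (x["c"] + x["p"])
    return {
        "c": 1 + share - C - P * x["a"],
        "d": 1 + share - P * x["p"],
        "p": 1 + share - C - K * (x["d"] + x["a"]) - P * x["a"],
        "a": 1 + share - K * (x["c"] + x["p"]) - P * x["p"],
    }


def grow_group(n, T):
    """Iterate Equation (7) for T generations; there is no local density regulation."""
    n = dict(n)
    for _ in range(T):
        w = group_fitness(n)  # every member of a type has the same fitness
        n = {t: n[t] + n[t] * w[t] for t in n}
    return n
```

For instance, `grow_group({"c": 0, "d": 5, "p": 5, "a": 0}, T=1)` gives about 11.0 defectors and 11.5 pro-social punishers, while absent types stay at zero.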

The above fitness functions describe social interactions within single groups. At the
metapopulation level, we model groups formed by random sampling of n individuals from
a global migrant pool of size N. This sampling is done without replacement, according
to a multivariate hypergeometric distribution. Reproduction and selection then occur
deterministically within these groups for T generations, according to Equation (7). Note
that there is no local density regulation in Equation (7); thus, different groups may grow
to different sizes over the course of the T generations. After T generations dispersal
occurs (ecologically, dispersal could be triggered by depletion of a resource patch, for
example). During the dispersal stage, a new migrant pool is formed by summing the
absolute type densities across all groups. Groups that have grown to a larger size will
make up a larger fraction of the new pool, representing a form of global competition
between groups. Global population regulation then occurs by proportionally rescaling
the migrant pool back to size N. The number of individuals of a type after population
regulation is computed by calculating the proportion of that type in the migrant pool
and multiplying by N, rounding the result to remove fractional parts and produce an
integer number of individuals.
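The group-formation and dispersal steps can be sketched with the Python standard library alone. This is one possible implementation, not the authors' code: shuffling the pool and cutting it into consecutive blocks is a simple way to draw groups without replacement (so group compositions are multivariate hypergeometric), and we assume, since the text does not say, that the N mod n individuals left over after forming groups are not placed in a group in that cycle. The names `form_groups` and `disperse` are hypothetical.

```python
import random

TYPES = ("c", "d", "p", "a")


def form_groups(pool, n, rng):
    """Split a migrant pool (type -> count) into floor(N/n) groups of size n.

    Shuffling the individuals and cutting them into consecutive blocks samples
    each group without replacement. Assumption: leftover individuals are not placed.
    """
    individuals = [t for t in TYPES for _ in range(pool[t])]
    rng.shuffle(individuals)
    n_groups = len(individuals) // n
    return [
        {t: individuals[g * n:(g + 1) * n].count(t) for t in TYPES}
        for g in range(n_groups)
    ]


def disperse(groups, N):
    """Pool all groups, then rescale proportionally back to N individuals.

    Larger groups contribute more to the pool (global competition between
    groups). Each type count is rounded separately, so the total can differ
    from N by a small amount; Python's round() uses round-half-to-even.
    """
    totals = {t: sum(g[t] for g in groups) for t in TYPES}
    grand = sum(totals.values())
    return {t: round(N * totals[t] / grand) for t in TYPES}
```

Rounding each type separately is the most literal reading of "rounding the result"; an implementation that must keep the pool at exactly N would need an extra correction step.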

Each individual in the migrant pool undergoes mutation with probability $\mu$. If chosen
for mutation, an individual's genotype is changed randomly to one of the other three 
types with equal probability. The individuals in the migrant pool then form the next 
generation of groups (i.e. colonise a new set of resource patches), as previously described. 
This process of group formation and dispersal then continues for a number of cycles, G. 
We simulate the model by the following procedure: 

1. Initialisation: form a migrant pool of N individuals with $N_c$ non-punishing
cooperators, $N_d$ non-punishing defectors, $N_p$ pro-social punishers and $N_a$ anti-social
punishers.

2. Group formation: form $\lfloor N/n \rfloor$ groups of size n by random sampling from the
migrant pool without replacement (where $\lfloor \cdot \rfloor$ denotes the floor function).

3. Reproduction and selection within groups: iterate Equation (7) T times for each
group (see text).

4. Dispersal: all individuals leave their groups to form a new migrant pool. The
migrant pool is then rescaled back to size N, keeping the proportion of types the
same. Fractional parts are rounded.

5. Mutation: each individual in the migrant pool undergoes mutation with probability
$\mu$ (see text).

6. Iteration: repeat from step 2 for G cycles.
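Putting steps 1-6 together, a compact end-to-end sketch might look as follows. The names `simulate` and `fitness`, the `seed` argument and the dictionary representation of type counts are illustrative choices, not taken from the original implementation; the sketch also assumes n ≤ N so that at least one group forms, and discards leftover individuals when N is not a multiple of n.

```python
import random

TYPES = ("c", "d", "p", "a")  # non-punishing cooperator/defector, pro-/anti-social punisher
B, C, K, P = 0.9, 0.1, 0.1, 0.5  # parameter values used in the simulations


def fitness(n):
    """Equations (3)-(6) for a group with type counts n."""
    total = sum(n.values())
    x = {t: n[t] / total for t in TYPES}
    share = B * (x["c"] + x["p"])
    return {
        "c": 1 + share - C - P * x["a"],
        "d": 1 + share - P * x["p"],
        "p": 1 + share - C - K * (x["d"] + x["a"]) - P * x["a"],
        "a": 1 + share - K * (x["c"] + x["p"]) - P * x["p"],
    }


def simulate(pool, n, T, G, mu, seed=0):
    """Run G cycles of steps 2-5, starting from the migrant pool of step 1."""
    rng = random.Random(seed)
    N = sum(pool.values())
    for _ in range(G):
        # Step 2: random groups of size n, sampled without replacement.
        ind = [t for t in TYPES for _ in range(pool[t])]
        rng.shuffle(ind)
        groups = [{t: ind[i:i + n].count(t) for t in TYPES}
                  for i in range(0, len(ind) // n * n, n)]
        # Step 3: T generations of deterministic growth, Equation (7).
        for g in groups:
            for _ in range(T):
                w = fitness(g)
                for t in TYPES:
                    g[t] += g[t] * w[t]
        # Step 4: dispersal and proportional rescaling back to N (rounded).
        totals = {t: sum(g[t] for g in groups) for t in TYPES}
        grand = sum(totals.values())
        pool = {t: round(N * totals[t] / grand) for t in TYPES}
        # Step 5: each individual mutates with probability mu to one of the other three types.
        mutated = dict.fromkeys(TYPES, 0)
        for t in TYPES:
            for _ in range(pool[t]):
                if rng.random() < mu:
                    mutated[rng.choice([u for u in TYPES if u != t])] += 1
                else:
                    mutated[t] += 1
        pool = mutated
    return pool
```

For instance, `simulate({"c": 125, "d": 125, "p": 125, "a": 125}, n=500, T=1, G=100, mu=0.01)` corresponds to a well-mixed population started with all four strategies at equal frequency, although stochastic mutation means individual runs will differ.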

The population structure described above is based on the Haystack model
(Maynard Smith, 1964; Cohen et al., 1976; Wilson, 1987; Sober and Wilson, 1998;
Bergstrom, 2002; Fletcher and Zwick, 2004, 2007; Powers et al., 2011), where we allow
both the size of social groups when they are founded, and the frequency of dispersal, to
be parameterised. When T = 1 and dispersal occurs every generation, this corresponds
to the well-studied trait-group model of Wilson (1975; cf. Hamilton, 1975; Michod,
1983; Nunney, 1985; Maynard Smith and Szathmáry, 1995; Pepper, 2000; Okasha, 2006;
Santos and Szathmáry, 2008). This provides us with a simple model that allows the
ecological and demographic factors affecting between-group variance, and hence the strength
of group selection, to be varied (Sober and Wilson, 1998).



3. Results 

In this section we show how two demographic factors, group size and dispersal
frequency, affect whether selection will favour pro- or anti-social punishment, or no
punishment at all. We first present analytical results from a simpler version of the model in
which pairs of strategies compete, and dispersal frequency is fixed with groups dispersing
after every generation. This population structure corresponds to Wilson's (1975)
trait-group model. It also parallels the anonymous single-shot public goods games used in
behavioural economics experiments, particularly those used to study strong reciprocity
(Fehr et al., 2002). We then go on to present results from the full simulation model,
in which all four strategies are present simultaneously, and both dispersal frequency and
group size are varied. Less frequent dispersal (T > 1) corresponds to a Haystack
population structure, and is an evolutionary analogue of repeated economic public goods
games.

3.1. Analytical results 

We explore first the evolutionary dynamics where only two strategies are present in the
population at any one time, and where dispersal occurs every generation (T = 1). We first
note that the fitness of each type is frequency-dependent (Equations (3)-(6)). In particular,
the fitness of both pro- and anti-social punishers is positive frequency-dependent. This is
because the total cost of punishing decreases as punishers become more common (since
fewer acts of punishment are required per punisher), while the total cost of being punished,
borne by their targets, increases (due to more individuals performing punishing acts). Thus,
although neither type of punishment may easily be able to invade from rarity (though see
Discussion, Section 4), they can be selected for from higher initial frequencies
(Lehmann et al., 2007). We therefore focus on how the initial proportion of a strategy
required for it to be selectively favoured changes with respect to group size. In addition
to the analytical results, we verified each threshold frequency derived below numerically
in the simulation model.

3.1.1. Non-punishing cooperators vs. non-punishing defectors 

We first consider pairwise competition between non-punishing cooperators and non- 
punishing defectors. The fitness of each type, given a group containing j cooperating 
co-players, is: 



$$w_c(j) = B\frac{j + 1}{n} - C + 1, \qquad w_d(j) = B\frac{j}{n} + 1 \qquad (8)$$



To calculate whether cooperation increases or decreases in proportion, we need to know
the probability distribution $P_n(j)$ for an individual to be placed in a group with $j$
cooperating co-players, given that $x_c$ is the global frequency of cooperators. In our model,
we assume that groups are formed by a random sampling process. In the full model, this
sampling occurs without replacement from a finite population according to a hypergeometric
distribution. However, for ease of analysis we derive the analytical results in this section by
sampling groups from an infinite population with replacement; this does not significantly
alter the qualitative behaviour of the results. We thus model group formation in this section
by sampling from a binomial distribution, so that
$P_n(j) = \binom{n-1}{j} x_c^{j} (1 - x_c)^{n-1-j}$.

Let $g_c$ and $g_d$ be the expected fitnesses of cooperators and defectors after the groups
have been formed. We can calculate them by



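$$g_c = \sum_{j=0}^{n-1} P_n(j)\, w_c(j) = \sum_{j=0}^{n-1} \binom{n-1}{j} x_c^{j} (1 - x_c)^{n-1-j} w_c(j) = x_c B \frac{n-1}{n} + \frac{B}{n} - C + 1$$

$$g_d = \sum_{j=0}^{n-1} P_n(j)\, w_d(j) = \sum_{j=0}^{n-1} \binom{n-1}{j} x_c^{j} (1 - x_c)^{n-1-j} w_d(j) = x_c B \frac{n-1}{n} + 1 \qquad (9)$$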



where $x_c$ is the global proportion of cooperators. We see that cooperation increases in
frequency in the population when $g_c > g_d$, which occurs when

$$\frac{B}{n} > C. \qquad (10)$$



This is a standard result for the evolution of weak altruism in linear public goods games,
i.e. social traits that increase the absolute fitness of the actor but increase the absolute
fitness of other group members by even more (Wilson, 1975, 1979, 1990; Nunney, 1985;
Szathmáry, 2011). Because all group members, including the actor, receive the benefit of
cooperation, cooperation is selected for whenever the actor's share of the benefit
exceeds the cost to itself. This depends on group size and is given by B/n > C
(Figure 1a). This type of cooperation corresponds to what Pepper (2000) terms a
whole-group trait. Biological examples of such whole-group traits include siderophore
production in bacteria (Griffin et al., 2004), and the efficient use of shared resources
through lower consumption rates (Pfeiffer et al., 2001; Kreft, 2004; Killingback et al.,
2006). Whole-group traits also parallel the setup of economic public goods games
(Fehr et al., 2002; Herrmann et al., 2008). For such traits a smaller group size favours
cooperation because a cooperator receives a larger share of the benefits of its actions.
In this model, whenever B/n > C, cooperation fixes in the global population irrespective
of starting frequency, assuming deterministic evolution. For B/n < C, defection fixes.
An equivalent way of thinking about this is that for whole-group traits (such as public
goods), the relatedness of actor to recipients in this population structure is 1/n, assuming
an infinite global population size (Pepper, 2000). Relatedness here is taken to mean a
genetic correlation between actor and recipients at the locus for cooperation
(Foster et al., 2006; West et al., 2007). The condition B/n > C is then an instance of
Hamilton's rule (Hamilton, 1964).
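The closed forms in Equation (9) can be checked numerically by summing Equation (8) over the binomial distribution of group compositions. The sketch below (function names are ours) does exactly that; the summed advantage $g_c - g_d$ should equal $B/n - C$ whatever the value of $x_c$, which is why the outcome does not depend on starting frequency.

```python
from math import comb


def binomial_average(w, n, x):
    """Sum of P_n(j) * w(j), with P_n(j) the Binomial(n - 1, x) probability mass function."""
    return sum(comb(n - 1, j) * x**j * (1 - x) ** (n - 1 - j) * w(j) for j in range(n))


def cooperator_advantage(n, x_c, B, C):
    """g_c - g_d from Equation (9), computed by direct summation over Equation (8)."""
    def w_c(j):  # focal cooperator with j cooperating co-players
        return B * (j + 1) / n - C + 1

    def w_d(j):  # focal defector with j cooperating co-players
        return B * j / n + 1

    return binomial_average(w_c, n, x_c) - binomial_average(w_d, n, x_c)


if __name__ == "__main__":
    B, C = 0.9, 0.1
    for n in (5, 20):
        for x_c in (0.1, 0.5, 0.9):
            # the summed advantage should match the closed form B/n - C
            print(n, x_c, round(cooperator_advantage(n, x_c, B, C), 6), round(B / n - C, 6))
```

With the simulation values B = 0.9 and C = 0.1, cooperation is favoured for n < 9 and disfavoured for n > 9.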
Figure 1: Analytical results of pairwise competition between strategies. Groups are
formed by binomial sampling from the global population and disperse every generation
(trait-group model). For this plot, parameters other than group size are fixed to the
values used later in the simulations (B = 0.9, C = 0.1, K = 0.1, P = 0.5); cf. main text.
a) Non-punishing cooperators vs. non-punishing defectors. b) Pro-social punishers vs.
non-punishing defectors. c) Anti-social punishers vs. non-punishing cooperators.
d) Anti-social vs. pro-social punishers.



3.1.2. Pro-social punishers vs. non-punishing defectors

We next investigate the evolutionary dynamics of pro-social punishers compared to
defectors. Recall that pro-social punishers are cooperators that identify individuals that
defected, and then pay a fitness cost K to reduce the fitness of defectors in their group
by P. From this we can derive the fitness of both strategies, given that they are placed
in a group containing $j$ pro-social punishers (excluding self):

$$w_p(j) = B\frac{j + 1}{n} - C - K\frac{n - (j + 1)}{n} + 1, \qquad w_d(j) = (B - P)\frac{j}{n} + 1 \qquad (11)$$



As before, we assume that the distribution of strategies to groups is binomial; $P_n(j)$ is
then the probability of being placed in a group with $j$ other pro-social punishers, given that
groups are of size $n$. From this, we calculate the expected fitness of each type:



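$$g_p = \sum_{j=0}^{n-1} P_n(j)\, w_p(j) = \sum_{j=0}^{n-1} \binom{n-1}{j} x_p^{j} (1 - x_p)^{n-1-j} w_p(j) = x_p (B + K)\frac{n-1}{n} + \frac{B}{n} - C - K\frac{n-1}{n} + 1$$

$$g_d = \sum_{j=0}^{n-1} P_n(j)\, w_d(j) = \sum_{j=0}^{n-1} \binom{n-1}{j} x_p^{j} (1 - x_p)^{n-1-j} w_d(j) = x_p (B - P)\frac{n-1}{n} + 1 \qquad (12)$$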

where $x_p$ is the global frequency of pro-social punishers. Pro-social punishment increases
in proportion when $g_p > g_d$, or when

$$x_p > x_p^* = \frac{Cn - B}{(n - 1)(P + K)} + \frac{K}{P + K} \qquad (13)$$

where $x_p^*$ is the proportion of punishers needed in the population for punishment to be
selectively favoured. $x_p^*$ is horizontally asymptotic with respect to group size $n$, with
the asymptote at $x_p^* = (C + K)/(P + K)$. Figure 1b plots this critical value as a function
of $n$, the group size, using the parameters (B = 0.9, C = 0.1, K = 0.1, P = 0.5) that we
later use in the simulations; we discuss robustness of the numerical results with respect to
these parameters in Section 3.3. This result agrees with Equation 11 of Lehmann et al.
(2007) (noting that we include the effect of a punisher on itself) for the case of randomly
formed groups that disperse every generation, and applies where pro-social punishment
and cooperation are perfectly linked traits. We relax these assumptions in the simulation
model (Section 3.2).
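A short numerical check of Equation (13): the sketch below (function names are ours) computes $x_p^*$ and compares it with a direct binomial summation of Equation (11). With the simulation parameters, $x_p^* \approx 0.263$ at $n = 20$ and rises towards the asymptote $(C + K)/(P + K) = 1/3$ as groups grow.

```python
from math import comb


def x_p_star(n, B, C, K, P):
    """Critical pro-social punisher frequency against defectors, Equation (13)."""
    return (C * n - B) / ((n - 1) * (P + K)) + K / (P + K)


def punishers_favoured(n, x_p, B, C, K, P):
    """Check g_p > g_d by summing Equation (11) over binomial group compositions."""
    def w_p(j):  # focal pro-social punisher with j punishing co-players
        return B * (j + 1) / n - C - K * (n - 1 - j) / n + 1

    def w_d(j):  # focal defector with j punishing co-players
        return (B - P) * j / n + 1

    def avg(w):
        return sum(comb(n - 1, j) * x_p**j * (1 - x_p) ** (n - 1 - j) * w(j) for j in range(n))

    return avg(w_p) > avg(w_d)


if __name__ == "__main__":
    B, C, K, P = 0.9, 0.1, 0.1, 0.5
    for n in (20, 100, 1000):
        print(n, round(x_p_star(n, B, C, K, P), 4))
    print("asymptote", round((C + K) / (P + K), 4))
```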



3.1.3. Non-punishing cooperators vs. anti-social punishers 

We now turn to investigate the dynamics of anti-social punishers in competition with 
non-punishing cooperators. In particular, we focus on whether anti-social punishment 
can prevent the invasion of cooperation even when B/n > C, i.e. even when the actor 
gains in absolute fitness terms from cooperating. The fitnesses in a group containing j 
cooperators are: 

$$w_c(j) = B\frac{j + 1}{n} - C - P\frac{n - (j + 1)}{n} + 1, \qquad w_a(j) = (B - K)\frac{j}{n} + 1 \qquad (14)$$



Let $g_c$ and $g_a$ be the fitness of non-punishing cooperators and anti-social punishers,
respectively. Again let $P_n(j)$, the probability of an individual being placed in a group
with $j$ cooperating co-players, be the probability mass function of a binomially distributed
random variable. Then



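$$g_c = \sum_{j=0}^{n-1} P_n(j)\, w_c(j) = \sum_{j=0}^{n-1} \binom{n-1}{j} x_c^{j} (1 - x_c)^{n-1-j} w_c(j) = x_c (B + P)\frac{n-1}{n} + \frac{B}{n} - C - P\frac{n-1}{n} + 1$$

$$g_a = \sum_{j=0}^{n-1} P_n(j)\, w_a(j) = x_c (B - K)\frac{n-1}{n} + 1 \qquad (15)$$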



where $x_c$ is the global proportion of cooperators. Cooperation thus increases in
proportion when:

$$x_c > x_c^* = \frac{Cn - B}{(n - 1)(P + K)} + \frac{P}{P + K} \qquad (16)$$

Critically, we see that cooperation can also decrease in proportion, even when $B/n - C > 0$.
Figure 1c plots the critical value $x_c^*$ against group size $n$. This has a horizontal
asymptote at $x_c^* = (C + P)/(P + K)$.
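To illustrate the critical point that anti-social punishment can block cooperation even when $B/n > C$, the sketch below evaluates Equation (16) and the closed-form difference $g_c - g_a$ obtained from Equation (15). The function names are hypothetical. With the simulation parameters and $n = 5$, $B/n - C = 0.08 > 0$, yet $x_c^* = 2/3$, so cooperation still declines from any starting frequency below two thirds.

```python
def x_c_star(n, B, C, K, P):
    """Critical cooperator frequency against anti-social punishers, Equation (16)."""
    return (C * n - B) / ((n - 1) * (P + K)) + P / (P + K)


def cooperator_gap(n, x_c, B, C, K, P):
    """Closed-form g_c - g_a, obtained by subtracting the two lines of Equation (15)."""
    return x_c * (P + K) * (n - 1) / n - (C * n - B) / n - P * (n - 1) / n


if __name__ == "__main__":
    B, C, K, P = 0.9, 0.1, 0.1, 0.5
    n = 5
    print("B/n - C =", round(B / n - C, 3))  # positive, so B/n > C holds
    print("x_c* =", round(x_c_star(n, B, C, K, P), 4))
    for x_c in (0.5, 0.8):
        print(x_c, "increases" if cooperator_gap(n, x_c, B, C, K, P) > 0 else "decreases")
```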



3.1.4. Anti-social vs. pro-social punishers

Finally, we consider the dynamics of a population containing both pro- and anti-social
punishers. As before, we let $w_p(j)$ and $w_a(j)$ be the fitness of a pro-social punisher and
an anti-social punisher in a group of size $n$ containing $j$ pro-social punishers (excluding
self). Then

$$w_p(j) = B\frac{j + 1}{n} - C - (P + K)\frac{n - (j + 1)}{n} + 1, \qquad w_a(j) = (B - K - P)\frac{j}{n} + 1 \qquad (17)$$



Let $g_p$ and $g_a$ be the fitness of each type after they have been placed into groups. The
groups are again formed binomially. Then,



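$$g_p = \sum_{j=0}^{n-1} \binom{n-1}{j} x_p^{j} (1 - x_p)^{n-1-j} w_p(j) = x_p (B + P + K)\frac{n-1}{n} + \frac{B}{n} - C - (P + K)\frac{n-1}{n} + 1$$

$$g_a = \sum_{j=0}^{n-1} P_n(j)\, w_a(j) = \sum_{j=0}^{n-1} \binom{n-1}{j} x_p^{j} (1 - x_p)^{n-1-j} w_a(j) = x_p (B - K - P)\frac{n-1}{n} + 1 \qquad (18)$$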

where $x_p$ is the global proportion of pro-social punishers. Thus, pro-social punishment
increases in proportion in the global population when

$$x_p > x_p^* = \frac{Cn - B}{2(n - 1)(P + K)} + \frac{1}{2} \qquad (19)$$

This has a horizontal asymptote at $x_p^* = \frac{1}{2} + \frac{C}{2(P + K)}$. The critical
initial frequency for pro-social punishment to be selected is plotted with respect to group
size in Figure 1d.
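Equation (19) can be checked in the same way as the earlier thresholds. The sketch below (hypothetical names) computes $x_p^*$ and the expected fitness difference $g_p - g_a$ by summing Equation (17) over binomial group compositions. With the simulation parameters the threshold is about 0.548 at $n = 20$, approaching $\frac{1}{2} + \frac{C}{2(P + K)} \approx 0.583$ for large groups, so pro-social punishers must start as a majority to win.

```python
from math import comb


def x_p_star_vs_asp(n, B, C, K, P):
    """Critical pro-social punisher frequency against anti-social punishers, Equation (19)."""
    return (C * n - B) / (2 * (n - 1) * (P + K)) + 0.5


def prosocial_gap(n, x_p, B, C, K, P):
    """g_p - g_a, summing Equation (17) over binomially formed groups."""
    def w_p(j):  # focal pro-social punisher with j pro-social co-players
        return B * (j + 1) / n - C - (P + K) * (n - 1 - j) / n + 1

    def w_a(j):  # focal anti-social punisher with j pro-social co-players
        return (B - K - P) * j / n + 1

    probs = [comb(n - 1, j) * x_p**j * (1 - x_p) ** (n - 1 - j) for j in range(n)]
    return sum(pj * (w_p(j) - w_a(j)) for j, pj in enumerate(probs))


if __name__ == "__main__":
    B, C, K, P = 0.9, 0.1, 0.1, 0.5
    for n in (5, 20, 100):
        print(n, round(x_p_star_vs_asp(n, B, C, K, P), 3))
    print("asymptote", round(0.5 + C / (2 * (P + K)), 3))
```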

3.1.5. Pro-social punishment vs. cooperation, anti-social punishment vs. defection, and 
the second-order free-riding problem 

In the absence of other strategies, pro-social punishment is neutral with non-punishing
cooperation. This is because in such a case there are no individuals to be punished, and
hence the total cost of punishing is zero. Likewise, anti-social punishment is neutral with
non-punishing defection in the absence of other strategies. However, all four strategies
may always be present at above-zero frequency in a population due to mutation. In
such cases the effects of pro-social punishment, in terms of reducing the proportion of
defectors and anti-social punishers in the group, are shared equally by both pro-social
punishers and non-punishing cooperators. Non-punishing cooperators, however, would
not pay the cost of punishing and so would be expected to be fitter (the second-order
public goods problem; Colman, 2006; Eldakar and Wilson, 2008). Similarly, the effects of



anti-social punishment, in terms of reducing the frequency of competing individuals with 
the cooperative trait, are felt equally by both anti-social punishers and non-punishing 
defectors in the same group. Non-punishing defectors, however, do not pay the cost of 
punishment. Thus, since in this model neither type of punishment differentially benefits 
punishers within a group, we might expect both types of punisher to be replaced by their 
non-punishing counterparts. Hence, we would not expect either type of punishment to 
be evolutionarily stable when non-punishing mutants are introduced. We investigate this 
through simulation in the next section. 
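The within-group cost asymmetry behind this argument can be read off Equations (3)-(6) directly. Subtracting the fitness of each non-punishing type from that of its punishing counterpart in the same group gives (a short derivation added for illustration):

$$w_p - w_c = -K(x_d + x_a) \le 0, \qquad w_a - w_d = -K(x_c + x_p) \le 0$$

Within any group, a punisher is therefore never fitter than its non-punishing counterpart, and the two are equal only when there are no targets to punish.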
3.2. Simulation results

For the primary simulation results below, we fix the migrant pool size at N = 500, the
benefit of cooperation at B = 0.9, the cost of cooperation at C = 0.1, the cost of being
punished at P = 0.5, the cost to the punisher at K = 0.1, and the mutation rate at
$\mu$ = 0.01. We vary the group size, n, and the number of generations before dispersal, T.
We thus focus on the demographic factors that affect selection on punishment. We return
to examining parameter sensitivity after presenting the primary results (Section 3.3).

As the analytic results above show, outcomes are sensitive to initial conditions, since
selection pressures depend upon the frequencies of all four strategies (Equations (3)-(6)).
As a consequence, although a type may not be likely to invade when rare, it may be
maintained or increase in frequency by selection when established at sufficient frequency.
In this study, we focus on the initial condition where all four strategies are present in
equal frequency. We address the validity of this assumption in the Discussion (Section 4).
As we show below, from this state pro-social punishment and cooperation will be selected
against in a well-mixed population. We therefore investigate how the presence of various
types of group structure changes selection pressures on the maintenance of both types of
punishment.
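As an illustrative check of why cooperation and pro-social punishment are initially selected against, the sketch below evaluates Equations (3)-(6) at the equal-frequency starting point, $x_c = x_d = x_p = x_a = 1/4$, with the parameter values above. It only captures the selection pressure in the first generation of a single well-mixed group; it is not a substitute for the full simulation with mutation.

```python
# Equations (3)-(6) at the equal-frequency initial condition, x_c = x_d = x_p = x_a = 1/4.
B, C, K, P = 0.9, 0.1, 0.1, 0.5
x = 0.25
share = B * (x + x)  # cooperators plus pro-social punishers contribute
w = {
    "non-punishing cooperator": 1 + share - C - P * x,
    "non-punishing defector": 1 + share - P * x,
    "pro-social punisher": 1 + share - C - K * (x + x) - P * x,
    "anti-social punisher": 1 + share - K * (x + x) - P * x,
}
ranking = sorted(w, key=w.get, reverse=True)
print(ranking)
# ['non-punishing defector', 'anti-social punisher', 'non-punishing cooperator', 'pro-social punisher']
```

Both cooperative types start below both defecting types, and within each pair the punishing type is the less fit because it pays K.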

3.2.1. Well-mixed population 

We first investigate the effects of mutation in a well-mixed population. In the present 
model, we do this by setting n = N = 500 and T = 1. With all four strategies started in 
equal frequency, both cooperate and pro-social punishment are driven extinct, apart from 
reintroduction by recurring mutation (fig.[2aj). If anti-social punishment was neutral with 
non-punishing defect, we would expect both types to reach a proportion of approximately 
50% under mutation. However, we see that anti-social punishment is in fact weakly 
selected against, being held at a frequency of around 40%. This is because anti-social 
punishers are paying the cost of punishing the few cooperators and pro-social punishers 
that are being reintroduced by mutation. The total cost of these acts of punishment 
is not large, however, because there are few individuals to punish and cooperators are 
selected against in a well-mixed population even without anti-social punishment (see
analytical results, Section 3.1.1).

We next consider the case where pro-social punishment is initially fixed in the popu- 
lation. If only defectors (and anti-social punishers) arose by mutation, then pro-social 
punishment would be stable. However, we also allow for the possibility of non-punishing 
cooperators. In this situation, non-punishing cooperator mutants increase in frequency 
towards 50% (Fig. 2b), as would be expected if pro-social punishment were neutral relative to
non-punishing cooperation. However, the resulting decline in the frequency of pro-social
punishment under mutation creates a second-order Tragedy of the Commons. That is,
as pro-social punishment drops in frequency, acts of punishment become too rare to
prevent defection from being favoured; the condition for punishment to prevent defection
in a well-mixed population, $P x_p > K x_d + C$, is then no longer met. The increase in
non-punishing cooperators thus creates a selective environment that favours defection. 
This illustrates that pro-social punishment need not be stable in a well-mixed population 
when non-punishing mutants are introduced. We also note that this result holds even 
in the absence of anti-social punishment. Anti-social punishment thus plays no selective 
role in a well-mixed population, and is maintained at a frequency below that expected 
under neutrality with non-punishing defect. 
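The second-order Tragedy of the Commons can be made concrete by checking the well-mixed condition as non-punishing cooperators replace pro-social punishers. This is an illustrative sketch only: it assumes defectors are held at a hypothetical mutation-level frequency $x_d = 0.01$, uses the primary values of P, K and C, and steps through made-up values of $x_p$ rather than simulation output. The function names are hypothetical.

```python
from fractions import Fraction

# Primary parameter values from the text.
P, K, C = Fraction(1, 2), Fraction(1, 10), Fraction(1, 10)


def punishment_deters_defection(x_p: Fraction, x_d: Fraction) -> bool:
    """Well-mixed condition from the text: P * x_p > K * x_d + C."""
    return P * x_p > K * x_d + C


def critical_x_p(x_d: Fraction) -> Fraction:
    """Pro-social punisher frequency at which the condition becomes an equality."""
    return (K * x_d + C) / P


x_d = Fraction(1, 100)  # hypothetical mutation-level frequency of defectors
print(float(critical_x_p(x_d)))  # 0.202

# Hypothetical decline in x_p as non-punishing cooperators drift upward.
for x_p in (Fraction(9, 10), Fraction(1, 2), Fraction(1, 4), Fraction(1, 5)):
    print(float(x_p), punishment_deters_defection(x_p, x_d))
```

Once $x_p$ falls to about 0.2, the cost of cooperating is no longer offset by expected punishment, even though defectors remain rare.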
Figure 2: Simulation results of competition between pro-social punishers, anti-social
punishers, non-punishing cooperators, and non-punishing defectors in a well-mixed population
with mutation. a) All types initially present in equal frequency, showing the selective
advantage of defection from this initial frequency. b) Pro-social punishment initially at
fixation. Although stable as a strategy against invasion by defectors only (not shown), 
the increase in non-punishing cooperator mutants creates a second-order Tragedy of the 
Commons in which first pro-social punishment, and subsequently cooperation, collapses. 



3.2.2. Effect of group size 

We now consider the effects of group structure on the selective dynamics. We first 
vary group size, holding T = 1. With all four strategies started in equal frequency, we 
find that cooperation and pro-social punishment fix in the population for group sizes 
n < B/C. With B = 0.9 and C = 0.1, in our model this is a group size less than
9. This is the classic result for randomly formed, single generational groups, without 
punishment (see analytical results). Thus, in this case allowing both cooperators and 
defectors to punish gives the same outcome as if neither type could punish. However, 
if the option of pro-social punishment is removed (by replacing all pro-social punishers 
with non-punishing cooperators in the initial condition), but anti-social punishment is
maintained, we find that the condition for cooperation to evolve is more stringent than 
n < B/C. Specifically, we found that cooperation only evolved for a group size less than 
7. On the other hand, if anti-social punishment is removed from the initial condition (by 
replacing all anti-social punishers with non-punishing defectors), then cooperation and 
pro-social punishment can evolve even for group sizes where n > B/C. In terms of the
parameters used in this study, pro-social punishment and cooperation reliably evolved in
group sizes below 14, i.e. above the threshold n = B/C = 9.

Importantly, when n > B/C, a whole-group trait that provides a fixed amount of
benefit switches from being weakly to strongly altruistic (Pepper, 2000), or becomes
"altruistic" in the sense used by Hamilton (1964) and more recently advocated by
Lehmann and Keller (2006) and West et al. (2007). Such cooperation, when considered
outside the context of punishment, causes an absolute reduction in the lifetime fitness of
the actor. This illustrates that pro-social punishment can allow a cooperative trait that
would otherwise be strongly altruistic to evolve in randomly formed groups. This occurs
because pro-social punishment modifies the direct costs and benefits of cooperation,
so that it is effectively no longer altruistic (Boyd and Richerson, 1992; Lehmann et al.,
2007).
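The switch from weak to strong altruism at n = B/C can be checked directly. This sketch assumes, as in a linear public goods game, that each cooperator's benefit B is shared equally among all n group members including the actor, so the actor recoups B/n of its own contribution. The function names are illustrative.

```python
from fractions import Fraction

B, C = Fraction(9, 10), Fraction(1, 10)  # primary values from the text


def own_share_minus_cost(n: int) -> Fraction:
    """Actor's return from its own contribution (B/n) minus its cost C,
    assuming B is shared equally among all n group members, actor included."""
    return B / n - C


def classify(n: int) -> str:
    d = own_share_minus_cost(n)
    if d > 0:
        return "weakly altruistic"    # actor still gains absolutely from its own act
    if d < 0:
        return "strongly altruistic"  # absolute reduction in the actor's own fitness
    return "neutral"                  # exactly at n = B/C


for n in (4, 8, 9, 10, 14):
    print(n, classify(n))
```

With the primary parameters the boundary falls at n = 9, the value quoted for randomly formed single-generation groups.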



We also find that pro-social punishment and non-punishing cooperation form a stable 
polymorphic equilibrium in such cases, even in the presence of recurrent mutations. That 
is, the second-order Tragedy of the Commons does not occur in such group-structured 
populations. Both this and the maintenance of pro-social punishment and cooperation
even when n > B/C arise because the population structure localises interactions.
Specifically, mean fitness is lower in groups containing a greater proportion of defectors
(Equations 3-5). Thus after dispersal such groups will make up a smaller fraction of the
migrant pool, since they will contain fewer individuals relative to other groups after one
iteration of Equation 7. For this to occur, different groups must contain different
proportions of defectors when they are formed. That is, there must be variance in the
composition of groups (Wilson, 1975). Since the groups are
formed by random sampling, a smaller group size provides greater variance. Thus, smaller 
group sizes provide stronger selection against defection and anti-social punishment. 
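The dependence of between-group variance on group size follows from sampling without replacement. Assuming groups of size n are drawn uniformly at random from a migrant pool of N individuals in which a fraction p are defectors, the variance of a group's defector fraction is the hypergeometric variance divided by $n^2$. The sketch below evaluates it for N = 500 and p = 0.5 (anti-social punishers plus non-punishing defectors when all four strategies start at 25%); `var_fraction` is a hypothetical helper.

```python
from fractions import Fraction
from math import sqrt


def var_fraction(N: int, p: Fraction, n: int) -> Fraction:
    """Variance of the trait fraction in a group of n individuals drawn without
    replacement (hypergeometric sampling) from a pool of N with trait fraction p."""
    return p * (1 - p) * (N - n) / (n * (N - 1))


N, p_def = 500, Fraction(1, 2)
for n in (5, 10, 20, 50, 500):
    sd = sqrt(float(var_fraction(N, p_def, n)))  # standard deviation of the defector fraction
    print(f"n = {n:3d}: sd of defector fraction = {sd:.3f}")
```

With n = N = 500 the variance is zero, which is why setting n = N recovers the well-mixed case.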

The results of this section illustrate that either type of punishment can prevent the evo- 
lution of a behaviour which would otherwise be selected given the population structure. 
They also illustrate that when both types of punishment are started in equal frequency 
their effects can be canceled out, with the population structure again becoming the de- 
terminant of selection on cooperation. 



3.2.3. Effect of dispersal frequency 

We next investigate varying the number of generations between dispersal episodes, 
T. We first consider the case in which anti-social punishment is removed, giving initial 
starting frequencies of 25% pro-social punishers, 25% non-punishing cooperators, and 
50% non-punishing defectors. Figure 3a shows the effect of T on the largest group size
for which the equilibrium is polymorphic for pro-social punishment and non-punishing
cooperation, i.e. where defection is removed by selection. We counted a simulation run as
having reached this equilibrium if the global proportion of pro-social punishers and
cooperators exceeded 75% after 1000 cycles. The shading in the figure indicates the number of
simulation runs, out of 100, which reached this equilibrium. 
Figure 3: Effect of varying both group size and dispersal frequency. The shading
indicates the number of simulation runs, out of 100, in which selection favoured pro-social
punishment and cooperation after 1000 cycles. a) Without anti-social punishment; b)
with anti-social punishment. Smaller group size increases variation in group composition, 
thus providing stronger selection against defection. Less frequent dispersal favours pro- 
social punishment by creating an equilibrium selection process. However, the addition of 
anti-social punishment reduces the largest group size under which pro-social punishment 
and cooperation are selected. Initial conditions for this simulation are given in the text. 



The results show that the largest group size for which pro-social punishment and 
cooperation are selected increases with T. At first sight this result is surprising, since in
classic Haystack models without punishment a large T eliminates cooperation (Wilson,
1987; Sober and Wilson, 1998). This is because in such models there is only a single
equilibrium within groups: all defect. Thus, in the limit of T approaching infinity, any
group founded by at least one defector will be converted to all defect. Frequent dispersal
can, though, keep the population out of this equilibrium and allow cooperation to be
stable globally (Sober and Wilson, 1998). However, the addition of pro-social punishment
to such models means that groups founded by one or more defectors need not, for high
T, be converted to all defect.

For illustration, consider groups consisting of pro-social punishers and non-punishing 
defectors. Pro-social punishment is then a stable equilibrium within the group whenever 
the proportion of punishers causes the condition $P x_p > K x_d + C$ to be satisfied. Since a
single group constitutes a well-mixed population, i.e. each group member interacts with
all other group members with respect to public goods, this occurs when $x_p > (K+C)/(P+K)$
(see analytical results). Under the parameters used in the simulations, this is when the 
fraction of pro-social punishers when the group is founded is greater than 1/3. Groups 
founded by more than this proportion of pro-social punishers will thus be stable against 
defection. Thus, a high T need not cause an increase in defection within each group. 
Moreover, individuals in groups at the pro-social punishment equilibrium have a higher 
mean fitness, due to the benefits of cooperation (Equations 3-5). The more generations
the group stays together, the greater the cumulative total of this benefit compared
to non-cooperating groups, since groups grow at an exponential rate (with no negative
density-dependent effects) in accordance with Equation 7. Consequently, when dispersal
eventually occurs groups at the pro-social punishment equilibrium will have grown to a 
larger size, and will hence make up a larger fraction of the migrant pool. 
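The threshold of 1/3 follows from the within-group condition by substituting $x_d = 1 - x_p$, since the group contains only pro-social punishers and non-punishing defectors. The second line is a simplified illustration of the cumulative advantage. It assumes, as a simplification not stated in the text, that each of two groups founded at the same size n keeps a constant mean fitness $\bar{w}_1$ or $\bar{w}_2$ per generation between dispersal events; $n_1(T)$ and $n_2(T)$ are hypothetical notation for their sizes after T generations, and the full dynamics follow Equation 7.

```latex
\begin{align}
P x_p > K(1 - x_p) + C
  &\iff (P + K)\,x_p > K + C
  \iff x_p > \frac{K + C}{P + K} = \frac{0.1 + 0.1}{0.5 + 0.1} = \frac{1}{3}, \\
\frac{n_1(T)}{n_2(T)} &= \frac{n\,\bar{w}_1^{\,T}}{n\,\bar{w}_2^{\,T}}
  = \left(\frac{\bar{w}_1}{\bar{w}_2}\right)^{T}.
\end{align}
```

Under this simplification a constant per-generation advantage $\bar{w}_1 > \bar{w}_2$ compounds geometrically with T.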

A larger T can thus favour pro-social punishment, due to the reproductive advantage 
that individuals in groups at the pro-social punishment equilibrium enjoy with each gen- 
eration that the group stays together. This mechanism depends upon some groups being 
founded with type frequencies that fall within the basin of attraction for the pro-social 
punishment equilibrium, i.e. with $x_p > 1/3$. This can occur in a group-structured
population even when the global value of $x_p$ is below this (i.e. 0.25 as used in the simulations),
provided there is variance in the composition of groups when they are founded. In our 
model, this variance is provided through the formation of groups by hypergeometric (ran- 
dom) sampling of individuals from the migrant pool. This variance decreases, however, 
as the founding size of the groups becomes larger. Thus for larger group sizes, fewer 
groups lie in the basin of attraction for the pro-social punishment equilibrium. This 
explains why for large group sizes, a large T may not be sufficient to select for pro-social 
punishment, because too few (or no) groups may fall in its basin of attraction. 
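The argument can be quantified by asking how many groups are founded inside the basin of attraction. This is a hypothetical calculation, not the simulation code. It assumes founders are drawn uniformly without replacement from a migrant pool of N = 500 with $x_p = 0.25$ (M = 125 pro-social punishers), and it computes the exact hypergeometric probability that a group's founding fraction of pro-social punishers exceeds 1/3. `prob_above` is an illustrative helper.

```python
from fractions import Fraction
from math import comb


def prob_above(N: int, M: int, n: int, threshold: Fraction) -> float:
    """P(X/n > threshold) for X ~ Hypergeometric(N, M, n).

    X is the number of pro-social punishers in a group of n founders drawn without
    replacement from a migrant pool of N individuals, M of whom are pro-social punishers.
    """
    hits = sum(
        comb(M, k) * comb(N - M, n - k)  # ways to draw exactly k punishers (0 if k > M)
        for k in range(n + 1)
        if Fraction(k, n) > threshold    # strict inequality, as in x_p > 1/3
    )
    return hits / comb(N, n)


N, M = 500, 125  # migrant pool with x_p = 0.25
for n in (12, 24, 48, 96, 192):
    print(n, round(prob_above(N, M, n, Fraction(1, 3))))
```

For these group sizes the probability falls steadily with n. For very small groups the integer constraint on punisher counts can make the trend uneven (a founding fraction of exactly 1/3 does not satisfy the strict inequality), so the sketch starts at n = 12.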

Finally, we consider the effects of varying T as well as group size when anti-social 
punishment is added to the model. In Figure 3b we use the same parameters as in
Figure 3a, but here we reintroduce anti-social punishment, returning to the initial condition
in which all four strategies are at 25%. In this case we see that pro-social punishment and 
cooperation are selected over a much smaller range of group sizes for a given dispersal 
frequency. As T is increased, the increase in the largest group size under which pro-social 
punishment and cooperation are selected is also less pronounced. 

The reason for this result is that the basin boundary for the pro-social punishment 
equilibrium changes within a group. In particular, from the analytical results we know 
that in a single group founded by pro-social punishers and anti-social punishers only, the 
basin boundary lies at $x_p = 1 - \frac{P+K+C}{2(P+K)}$, which is a more stringent condition than in the
population without anti-social punishment ($x_p > \frac{K+C}{P+K}$). Under the parameters used in the
simulation, this corresponds to $x_p > 1/3$ without anti-social punishment, and $x_p > 5/12$
in the presence of anti-social punishment. We verified numerically that this result holds 
in the four-type system when the three other types are all started in equal proportion. 
Thus for a given frequency of pro-social punishment in the migrant pool, fewer groups 
are expected to fall in the basin of attraction for the pro-social punishment equilibrium 
when some of the non-punishing defectors are replaced with anti-social punishers. The 
presence of anti-social punishment therefore means that a larger between-group variance 
(relative to the migrant pool value of $x_p$) is required to select for pro-social punishment.
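The effect of the shifted boundary can be illustrated with a simple calculation. For a fixed migrant pool, raising the founding threshold from 1/3 to 5/12 shrinks the fraction of groups founded inside the basin. This sketch takes both boundary values from the text and assumes founders are drawn uniformly without replacement from a migrant pool of N = 500 with $x_p = 0.25$. It ignores everything except the founding frequency of pro-social punishers, so it is only a rough proxy for the four-type dynamics. The helper and constant names are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_above(N: int, M: int, n: int, threshold: Fraction) -> float:
    """P(X/n > threshold) for X ~ Hypergeometric(N, M, n) (founders drawn without replacement)."""
    hits = sum(comb(M, k) * comb(N - M, n - k)
               for k in range(n + 1) if Fraction(k, n) > threshold)
    return hits / comb(N, n)


N, M = 500, 125                # migrant pool with x_p = 0.25
WITHOUT_ASP = Fraction(1, 3)   # basin boundary without anti-social punishment (from the text)
WITH_ASP = Fraction(5, 12)     # basin boundary with anti-social punishment (from the text)

for n in (12, 24, 48):
    p_without = prob_above(N, M, n, WITHOUT_ASP)
    p_with = prob_above(N, M, n, WITH_ASP)
    print(f"n = {n}: {p_without:.4f} without vs {p_with:.4f} with anti-social punishment")
```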



3.3. Sensitivity to the effect-to-cost ratio of cooperation and punishment 



The benefit-to-cost ratio of the cooperative act affects the largest group size in which 
cooperation can evolve. Without either type of punishment, when T = 1, coop-
eration is selectively advantageous when B/n > C; this is the well-known result from
Wilson's (1975) trait-group model when groups are formed randomly. When both pro-
and anti-social punishment are available, and all four strategies are initially present in
the population at equal frequency, our simulations show that the effects of both
types of punishment on the level of cooperation cancel out. In other words, the condition
for cooperation to be favoured is still B/n > C when both types of punishment are added
and T = 1. We find this result to be insensitive to the effect-to-cost ratio of punishment;
it holds in the simulations even for P = K = 0.1, as well as for P > K.

For T > 1, we find that the addition of anti-social punishment reduces the largest 
group size over which cooperation evolves, compared to the case where only pro-social 
punishment was available. This result is also qualitatively insensitive to the benefit-to- 
cost ratio of cooperation. That is, decreasing the benefit-to-cost ratio of cooperation 
decreases the largest group size in which cooperation evolves in the case where both 
anti- and pro-social punishment are present, and similarly in the case where only pro- 
social punishment is available. The result that anti-social punishment further reduces 
this over the pro-social punishment only case, however, still holds regardless of the B/C 
ratio when K = 0.1 and P = 0.5. Further, we found that increasing P (while holding K 
constant) increases the magnitude of this effect. That is, the addition of anti-social pun- 
ishment makes a greater difference as P increases. As P decreases, however, both types 
of punishment have less effect and the difference between the two cases becomes smaller. 
When K = 0.1 and P < 0.4, we found that the addition of anti-social punishment made 
no difference to the range of group sizes over which cooperation evolves, compared to the 
case where only pro-social punishment is available. 

It should be noted that P > K is a common assumption both in models of the evolution
of punishment (e.g. Boyd et al., 2003, 2010; Bowles and Gintis, 2004; Lehmann et al.,
2007; dos Santos et al., 2011) and in experimental public goods games (Fehr et al., 2002;
Herrmann et al., 2008). Nevertheless, measuring the actual cost-to-effect ratio of
punishment in situ in real populations is very difficult. Indeed, some authors have
explicitly considered the case where P = K (Rand et al., 2010). However, at least in the
case of humans it is commonly held that the advent of tools from gossip to weaponry makes
punishment very effective at little cost to the punisher (Sober and Wilson, 1998; Bingham,
1999; Boehm, 1999; Binmore, 2005). We have thus focused our study on cases where
punishment is reasonably efficient in terms of effect-to-cost ratio, whilst still being less
efficient than the benefit-to-cost ratio of the cooperative act.



4. Discussion and conclusion 



We have presented here, to our knowledge, the first model of the evolution of anti-social 
punishment in group-structured populations. Previous work on anti-social punishment 
has nevertheless suggested that group structure would favour pro-social punishment and
prevent anti-social punishment from being effective (Rand et al., 2010; Rand and Nowak,
2011). Further, much previous work on the evolution of pro-social punishment in
group-structured populations has not even considered the possibility of anti-social punishment
(e.g. Boyd et al., 2003, 2010; Gardner and West, 2004; Lehmann et al., 2007), presumably
for the same reason. After all, group or kin selection should be expected to promote
behaviours that support cooperation rather than defection, assuming a population
structure that provides positive relatedness at the locus for cooperation (Hamilton, 1964;
Wilson, 1975).



In fact, we have shown here that anti-social punishment can be effective in preventing
the evolution of pro-social punishment and cooperation in group-structured populations.
Models for the evolution of pro-social punishment typically rely both on pro-social
punishment being a stable equilibrium within a group, and on some groups being
founded with initial strategy frequencies that fall within the basin of attraction for
this equilibrium. When these two conditions are met, equilibrium selection between
groups can occur (Harsanyi and Selten, 1988; Boyd and Richerson, 1990; Binmore, 1998;
Canals and Vega-Redondo, 1998), such that groups at the pro-social punishment and
cooperation equilibrium out-compete those at the defection equilibrium (Boyd et al., 2003).



We have shown here that the presence of anti-social punishers reduces the likelihood of 
the second condition being met, by reducing the basin of attraction for the pro-social 
punishment equilibrium, compared to a population where the anti-social punishers are 
replaced with non-punishing defectors. Consequently, a greater between-group variance 
in the frequency of pro-social punishment is required in order for some groups to fall in 
its basin of attraction, and equilibrium selection to occur. 

One way such a greater between-group variance can be achieved is through a reduc- 
tion in group size. However, such a requirement eliminates punishment as an explanation 
for the maintenance of cooperation in large human groups with low relatedness. Notice 
though that the results in this model assume that there is no structure to a society 
within a large group. However, internal structure within large groups may in fact be 
present e.g. due to social hierarchy or spatial distribution of public goods. Nevertheless, 
the homogeneity of within-group structure is a standard assumption in group-structured
models of the evolution of cooperation (Boyd and Richerson, 1990; Boyd et al., 2003,
2010; Wilson, 1975, 1987; Hamilton, 1975; Traulsen and Nowak, 2006; Lehmann et al.,
2007), and corresponds to a public good that is shared equally with all group
members. Moreover, if the effective group size of social interactions is smaller, then pro-social
punishment may not be necessary to maintain cooperation anyway; direct and indirect 
fitness benefits from the cooperative act itself may be sufficient. We have thus focused 
here on cases where the public good is shared equally between all group members and 
hence pro-social punishment is necessary to maintain cooperation in large groups. For 
the same reason, we have also focused on linear public goods games. This is because in 
non-linear public goods games, punishment would not be needed to maintain cooperation
even in large randomly formed groups (Archetti and Scheuring, 2012). Whether real-life
social dilemmas are linear or non-linear is an empirical question that must be answered 
on a case by case basis. 

It is worth stressing that we have considered the maintenance of punishment, 
rather than its invasion from mutation frequency. It is already widely appreciated 
that punishment can be stable when common, even if it is not selected when rare
(Boyd and Richerson, 1992; Lehmann et al., 2007). This is because in the standard
model, the total individual cost of punishment decreases as punishers increase in
frequency within a group (Equations 3-6). Thus, both pro- and anti-social punishment
undergo positive frequency-dependent selection. In light of this, much work has focused
on the maintenance of punishment (e.g. Boyd and Richerson, 1992; Henrich and Boyd,
2001; Gintis et al., 2003), as we do in this study. Several mechanisms have, however,
been suggested for the invasion of (pro-social) punishment from rarity. These include
kin benefits resulting from punishing acts reducing local competition (Lehmann et al.,
2007), the fixation of punishment within a single group through stochastic processes
(Boyd et al., 2003), voluntary participation in social interactions (Hauert et al., 2007),
systems of reputation (dos Santos et al., 2011), or the coordination of punishing
behaviour between individuals (Boyd et al., 2010). It has been shown more generally that
social traits which are maladaptive when rare, but advantageous once common, may
be able to reach the threshold frequency for positive selection by drift-like processes
(Boyd and Richerson, 1990; Boyd et al., 2003). One mechanism by which this may occur
is when environmental factors result in population oscillations, and periods where
the environment is temporarily below its maximum carrying capacity (Čače and Bryson,
2007; Alizon and Taylor, 200…).

Similarly, future studies should investigate the proximate mechanisms by which anti- 
social punishment might be favoured over simple non-punishing defection within a single 
group. This is similar to the classic problem of how pro-social punishers may be favoured
over non-punishing cooperators within a single group (Colman, 2006). Essentially, either
type of punishment can be favoured if the effects of punishment are not shared equally
with non-punishers in the same group, but instead feed disproportionately back to the
actor or their kin. As mentioned earlier, it is difficult to imagine a direct advantage from
anti-social punishment, at least as described here and in the economics literature. How- 
ever, a linked consequence such as increased social status may serve as the explanatory 
benefit. 

In conclusion, we have shown here that the presence of anti-social punishers reduces 
the range of conditions over which pro-social punishment and cooperation are stable 
in group-structured populations. This occurs because anti-social punishment reduces 
the basin of attraction for the pro-social punishment equilibrium within groups. Thus, 
a given magnitude of between-group variance may no longer be sufficient to select for 
pro-social punishment. In particular, we have shown here how the range of group sizes 
over which pro-social punishment is selected can be greatly reduced by anti-social
punishment. Given the existence of anti-social punishment in all studied extant human cultures
(Herrmann et al., 2008; Sylwester et al., 2011), our results suggest that the claims of
models showing the evolution of pro-social punishment in group-structured populations 
should be re-evaluated with the addition of anti-social punishment. 